Abstract



We consider the problem of estimating functions of distributed data using a distributed algorithm
over a network. The extant literature on computing functions in distributed networks such as wired
and wireless sensor networks and peer-to-peer networks deals with computing linear functions of the
distributed data when the alphabet size of the data values is small, $O(1)$.
We describe a distributed randomized algorithm to estimate a class of non-linear functions of the
distributed data which is over a large alphabet. We consider three types of networks: point-to-point
networks with gossip based communication, random planar networks in the connectivity regime and
random planar networks in the percolating regime, both of which use the slotted Aloha communication
protocol. For each network type, we estimate the scaled $k$-th frequency moments, for $k \ge 2$. For
every $k \ge 2$, we give a distributed randomized algorithm that computes, with probability $(1 - \delta)$,
an $\epsilon$-approximation of the scaled $k$-th frequency moment, $F_k/N^k$, using time $O(M^{1 - \frac{1}{k-1}} T)$ and
$O(M^{1 - \frac{1}{k-1}} \log N \log(\delta^{-1})/\epsilon^2)$ bits of transmission per communication step. Here, $N$ is the number
of nodes in the network, $T$ is the information spreading time and $M = o(N)$ is the alphabet size.

Keywords: In-network computing, frequency moments, randomized algorithms

1 Introduction

We consider the problem of distributed computation of the $k$-th frequency moment ($k \ge 2$) of data that is
distributed over a network. We assume that there are $N$ nodes in the network and each node holds a number
$x_i$ from a large alphabet set $A := \{1, \ldots, M\}$, where $M = o(N)$. If $N_m$ is the number of times $m \in A$
appears in the network, then the $k$-th frequency moment of the data is defined as
$F_k := \sum_{m=1}^{M} (N_m)^k$. The frequency moments are an important statistic of the input data. $F_0$ is the
number of distinct elements in the data, $F_1$ is the size of the data. $F_2$, also known as Gini's index or the
'surprise index', is a measure of the dispersion in the data. More generally, for $k \ge 2$, the frequency
moments are an indication of the skewness of the data: $F_k/N^k = 1$ indicates highly skewed data and
$F_k/N^k = 1/M^{k-1}$ corresponds to a uniform distribution of the data. Our interest is the case of $k \ge 2$
for which $F_k/N^k$ is in the range $[1/M^{k-1}, 1]$.
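As a small illustration of the definition, the following Python sketch computes $F_k$ and the scaled moment $F_k/N^k$ for data held centrally. The function names and the two toy data sets are ours and are only meant to show the two extremes of the range $[1/M^{k-1}, 1]$.

```python
from collections import Counter

def frequency_moment(data, k):
    """F_k = sum over symbols m of (N_m)^k, where N_m counts occurrences of m."""
    counts = Counter(data)
    return sum(c ** k for c in counts.values())

def scaled_frequency_moment(data, k):
    """F_k / N^k for a data vector of length N."""
    n = len(data)
    return frequency_moment(data, k) / n ** k

skewed = [7] * 12                 # every node holds the same symbol
uniform = [1, 2, 3, 4] * 3        # M = 4 symbols, each held by 3 nodes
print(scaled_frequency_moment(skewed, 2))   # 1.0
print(scaled_frequency_moment(uniform, 2))  # 0.25, i.e. 1/M
```

With $k = 0$ and $k = 1$ the same function returns the number of distinct elements and the size of the data, respectively.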

The estimation of the frequency moments has played a central role in designing algorithms for database 
management systems. Many algorithms for estimating the frequency moments have been considered in 
the past. For a detailed survey of the literature we point the reader to [16]. In this literature, the main 
assumption is that the data is being processed by a single processor. The processor gets a small snap-shot 
of the data at any given time and it revisits the data very few times. The primary focus of the known 
algorithms such as those of [1, 6, 8] is to reduce the space needed to estimate the frequency moments. This
is important because today the data size is massive while the amount of space available to process them is 
comparatively small. 

In this work, we focus on a setting that is different from that of [1, 6, 8]. We consider the model
in which the data is distributed among many processors. (The terms processor and node will be used
interchangeably.) We consider the case in which each processor holds exactly one element of the data and
the processors form a communication network. The rules governing the communication among the nodes
are fixed. Therefore, the algorithm must work against the given network topology, the given properties of
the network, and the given rules of communication in the network. As the data is distributed, the parameters
of interest are (a) the number of bits transmitted per node, and (b) the amount of time needed to compute
the estimates of the frequency moments at all the nodes or at a designated node. Our algorithms optimize
both parameters simultaneously.

There are several networks where algorithms like those of [1, 6, 8] can be used directly. As an
example, consider the case in which each node has a unique identifier and it can broadcast its data to every
other node in the network. In this case, the task is easy. The algorithm designer can assign the role of a
leader to one of the nodes and assign one slot for each node to transmit. The nodes then broadcast their data
during the assigned slot. The leader receives the data broadcast by other nodes as a stream of data. This
is identical to the situation of a unique processor and a massive dataset. The application of the algorithms
of [1, 6, 8] is now obvious. Two network characteristics complicate matters: (1) nodes do not have a global
identifier, e.g., point-to-point networks with gossip based communication (like those considered in [3, 13])
and structure-free wireless sensor networks with slotted Aloha based communication (like that in [12]), and
(2) nodes form a multi-hop network (like in [3, 12, 13]), possibly with a fraction of the nodes not being a
part of the main connected component (like in [18]). In this paper we consider the following three settings
that have these two characteristics and develop randomized algorithms to obtain an estimate of $F_k/N^k$.

• Point-to-point network with gossip based communication: Here every node in the network knows its 
neighbors and can only communicate with them. The network is assumed to form a single connected 
component. At the end of the computation, each node is required to know the value of the function. 
Many recent works have considered this setting in which communicating pairs are chosen randomly 
at each time step; time steps are generated by a Poisson clock. See for example [2, 3, 13]. We will 
refer to these as gossip networks. 

• Random planar radio networks (RPRN) with slotted Aloha communication: The nodes are randomly 
distributed in the unit square. Each node broadcasts its data and all nodes within the transmission 
range of it receive this broadcast data. The efficiency of these networks is determined by the spatial 
reuse factor which is inversely proportional to the square of the transmission range. Thus we want 
the transmission range to be as small as possible. However, if it is too small, the network will be 
disconnected and computing a global function will be impossible. From [11], we know that the
smallest transmission range for which the network will be a single connected component with high
probability is $r(N) = \Theta\left(\sqrt{\ln N / N}\right)$. This setting of the network (i.e., radius set to $r(N)$) is
referred to as the connectivity regime and we will call them connected RPRNs. This regime for
function computation has been studied in [7, 10, 12]. For networks in which the nodes have a global 
identity, like in the models of [10], a trivial extension to the algorithms for type sensitive functions 
(as in [9]) can be used. 

• Percolating RPRNs with slotted Aloha communication: This is similar to the preceding setting except
that the transmission range is smaller and is chosen to produce a single giant component in
the network rather than a single connected component. In this regime the network will have several
smaller components in addition to the giant component. Computation will be performed in the giant
connected component. We will call this setting percolating RPRNs. Since this component does not
contain a constant fraction of the nodes, there is data loss and the computation is necessarily
approximate. The quality of this approximation can be controlled by a suitable choice of the transmission
range which will be $\Theta\left(1/\sqrt{N}\right)$. Such a setting has been considered recently in [18].



In networks without global identifiers, a straightforward randomized algorithm to compute any function
is as follows. Let each node pick a number independently and randomly from a large enough range (say
$4N^3$). By a union bound, each node will have a unique identifier with high probability. Now any algorithm
which works with node identifiers will work. Therefore, for point-to-point (random planar) networks, we
will be able to design a randomized algorithm which transmits $O(N^3 \log M)$ bits per processor. We do not
know any obvious technique to reduce the number of bits transmitted per processor. However, our goal is
to design algorithms that transmit $o(N)$ bits per processor.
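The union bound behind the random identifiers can be made concrete with a short Python sketch. It assumes identifiers are drawn uniformly from $\{1, \ldots, 4N^3\}$; the helper names are ours and purely illustrative.

```python
import random

def collision_bound(n, id_range):
    """Union bound: P[some pair of nodes shares an identifier] <= C(n, 2) / id_range."""
    return n * (n - 1) / 2 / id_range

def draw_identifiers(n, rng):
    """Each node independently picks an identifier uniformly from {1, ..., 4 n^3}."""
    return [rng.randint(1, 4 * n ** 3) for _ in range(n)]

n = 1000
ids = draw_identifiers(n, random.Random(0))
print(len(set(ids)) == n)               # True exactly when no two nodes collided
print(collision_bound(n, 4 * n ** 3))   # below 1/(8n)
```

With this range the bound is $(N-1)/(8N^2) < 1/(8N)$, so the failure probability vanishes as $N$ grows.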

As pointed out by [10], and to the best of our knowledge, all of the literature on in-network function
computation aims to compute linear functions of the data distributed over the network such as the sum of
the data values or average of the data values. Also, they work for a small input alphabet, i.e., $M = O(1)$.
We break away from both these restrictions.

Our Contributions: 

• We give algorithms to estimate scaled frequency moments in the three types of networks listed above. 
To the best of our knowledge, this is the first work which estimates a class of non-linear functions in 
such networks. 

• We also get rid of the standard restriction that $M = O(1)$. We allow $M \to \infty$. The only constraint
we have on $M$ is $M = o(N)$.

We achieve this by using two techniques: (1) sketching, which is a standard tool in many randomized
algorithms (e.g., [14, 15]), and (2) exponential random variables, which were first introduced in distributed
computing by [4] and later used by many other works including those on gossip based computation (e.g.,
[13]).

Intuitively, the technique of sketching reduces the problem of size $M = o(N)$ to that of $M = O(1)$.
This alone does not suffice. We observe that the existing sketching algorithms for computing frequency 
moments have some additional properties, which help us compose exponential random variables with the 
random maps used for sketching. These two maps give a small set of random variables. We analyze the 
properties of these random variables to finally obtain our results. The main theorems in our paper can be 
stated as follows: 

Theorem 1. For all constants $\epsilon, \delta \in (0, 1)$, there exist $r_1, r_2 = \mathrm{poly}(\epsilon^{-2}, \log \delta^{-1})$ such that there is
a randomized algorithm that runs in time $O(T)$, uses $O(r_1 r_2 \log N)$ bits of transmission per step, and
computes an estimate of $F_2/N^2$, say $\hat{f}_2$, such that $P\left[|\hat{f}_2 - F_2/N^2| > \epsilon\right] < \delta$. Here, $T$ is equal to
$T_1$ for gossip networks and $T_2$ for connected RPRNs.

Theorem 2. For all $k \ge 3$ and for all constants $\epsilon, \delta \in (0, 1)$, there exist $r_1, r_2 = \mathrm{poly}(\epsilon^{-2}, \log \delta^{-1})$,
$B \cdot s_1 = O(M^{1 - \frac{1}{k-1}})$ such that there is a randomized algorithm that runs in time $O(B s_1 T)$, uses
$O(r_1 r_2 \log N)$ bits of transmission per step and computes an estimate of $F_k/N^k$, say $\hat{f}_k$, such that
$P\left[|\hat{f}_k - F_k/N^k| > \epsilon\right] < \delta$. Here, $T$ is equal to $T_1$ for gossip networks and $T_2$ for connected RPRNs.

2 Preliminaries 

In this section, we formalize many of the notions described in the preceding section. We also list a few 
known definitions and theorems which we will use in the subsequent sections. 

2.1 The model 

We assume that there are $N$ nodes in the network and the value of $N$ is known to all the nodes. Let $x_u \in A$
be the data at node $u$. Without loss of generality, we assume that $A = \{1, \ldots, M\}$. We further assume
$M = o(N)$ and define $x := (x_1, x_2, \ldots, x_N)$. As we mentioned earlier, we consider three different types
of network models.

Assume that the computation starts at time 0. At any time $t > 0$, each node would have an intermediate
function that is determined by a subset of the nodes in the network. Let $f_i(t)$ denote this function at node
$i$ at time $t$.

• Gossip networks: Here the nodes know their neighbors, but not the entire network topology; the
nodes do not have global identifiers. The communication model is as follows. There is a global
Poisson clock ticking at rate $N$ per unit time. A random communication and a corresponding
computation event is scheduled at each tick of the Poisson clock. At each clock tick, a node is selected
uniformly at random from among the $N$ nodes and the node performs a communication and a
computation operation with a randomly selected neighbor. The number of bits to be exchanged in each
communication and computation operation will be determined in Section 3.2. The time for the
algorithm to complete does not include the time to exchange data. This communication model is called
the gossip mechanism. The goal is to compute an estimate of the function value at each node using
such a gossip communication model. This model is fairly well known and is described in detail in,
among others, [3, 19].

Let $S_u(t)$ be the set of nodes that have the data $x_u$ and/or used it to compute their function at time $t$,
i.e.,

$$S_u(t) := \{v : \text{node } v \text{ has the value } x_u \text{ and/or used } x_u \text{ to compute } f_v(t)\}.$$

In a gossip algorithm at a clock tick at time $t$, if edge $(u, v)$ is chosen, then the data of the nodes
that have been used to compute $f_u(t^-)$ (which is the function value at $u$ immediately before time
$t$) would now determine $f_v(t)$, and likewise the data used to compute $f_v(t^-)$ would determine $f_u(t)$.
If $v \in S_u(t)$, then we say that $v$ has heard $u$ before time $t$. The information spreading time is defined as

$$T_1 := \inf\left\{t : P\left[\cup_{u=1}^{N} \{|S_u(t)| \ne N\}\right] \le \beta_1\right\}.$$

Here the probability is over the randomness of the communication algorithm. In other words, $T_1$ is
the minimum time required so that the event "every node has heard every other node" has occurred
with probability at least $(1 - \beta_1)$. As shown in [13], $T_1$ depends on the total number of nodes in the
network and also on how well the network is connected. Specifically,

$$T_1 = O\left(\frac{\log N + \log \beta_1^{-1}}{\Phi(P)}\right),$$

where $P$ is the adjacency matrix of the graph and $\Phi(P)$ is the conductance of $P$.

• Connected random planar radio networks: In this case nodes are deployed randomly in a unit square
and a graph is formed by constructing edges between all pairs of nodes which are at most $r(N)$
distance apart. Transmission of a node $u$ is received by all the nodes $v$ whose distance from $u$ is
less than $r(N)$. It has been shown in [11, 17] that if $r(N) = \Theta\left(\sqrt{\ln N / N}\right)$, then the network
is connected with high probability. This choice of $r(N)$ corresponds to the connectivity regime.
The communication algorithm used here is the slotted Aloha protocol: at any time step $t$ each node
transmits with probability $p_N$ and if it transmits, it will transmit the current information that it holds.
Multiple bits can be transmitted in each slot; the exact number will be determined in Section 3.2.
The objective is to compute an estimate of the function at all the nodes. If $p_N$ is chosen suitably
then it can be ensured that the transmission from any node will be received by at least one of its
neighbors with a constant probability, independent of $N$. See [12] for a more detailed discussion on
such information spreading algorithms.

Consider a time slot $t$ in which node $u$ transmits and it is received correctly by node $v$. Node $u$ would
be transmitting $f_u(t-1)$. Clearly, the data from the nodes that were used to compute $f_u(t-1)$ would
now determine $f_v(t)$. Let $S_u(t)$ be defined as before; the information spreading time $T_2$ can also be
defined similarly, i.e.,

$$T_2 := \min\left\{t : P\left[\cup_{u=1}^{N} \{|S_u(t)| \ne N\}\right] \le \beta_2\right\},$$

where the probability is over the randomness in the slotted Aloha protocol. A bound on $T_2$ for
$p_N = \Theta(1/\log N)$ and $r(N) = \Theta\left(\sqrt{\ln N / N}\right)$ is shown in [12].

• Percolating random planar radio networks: Connected RPRNs have transmission range
$r(N) = \Theta\left(\sqrt{\log N / N}\right)$. This means that the average degree of a node is $\Theta(\log N)$. Higher degree reduces
the spatial reuse factor, i.e., only $\Theta(N/\log N)$ nodes can transmit simultaneously. In [18] it is shown
that choosing $r(N) = \Theta\left(1/\sqrt{N}\right)$ and suitably deleting a small number of nodes from a random planar
network yields a giant component with all the nodes having a constant degree. Also, the number
of nodes in this giant component will be exponentially larger than the second largest component. In
fact, $r(N)$ can be chosen to ensure that the giant component has at least a specified fraction, say
$(1 - \alpha)$ (where $0 < \alpha < 1$), of the nodes. We will perform the computation in this giant component.
Since the nodes which are not in the giant component do not participate in the computation, the
computation is necessarily approximate. The analysis of this network will follow that of connected
RPRNs very closely. We will not elaborate on this here.
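To make the notion of information spreading time concrete, the following Python sketch simulates the gossip mechanism until every node has heard every other node. It is only an illustration under our own assumptions: the ring topology, the seed and the function name are hypothetical, and clock ticks are counted instead of sampling the Poisson clock.

```python
import random

def gossip_until_all_heard(neighbors, rng):
    """Run gossip exchanges until every node has heard every other node.

    neighbors[u] lists the neighbours of node u. At each clock tick a node u is
    chosen uniformly at random, it contacts a uniformly random neighbour v, and
    the two merge their sets of heard nodes. Returns (ticks, heard).
    """
    n = len(neighbors)
    heard = [{u} for u in range(n)]   # heard[u]: nodes whose data has reached u
    ticks = 0
    while any(len(h) < n for h in heard):
        u = rng.randrange(n)
        v = rng.choice(neighbors[u])
        merged = heard[u] | heard[v]
        heard[u], heard[v] = merged, set(merged)
        ticks += 1
    return ticks, heard

n = 16
ring = [[(u - 1) % n, (u + 1) % n] for u in range(n)]
ticks, heard = gossip_until_all_heard(ring, random.Random(1))
print(ticks / n)   # ticks arrive at rate N, so this approximates the elapsed time
```

Repeating the run over many seeds and taking a high quantile of the elapsed time gives an empirical analogue of $T_1$ for the chosen graph.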
2.2 Frequency moments

Recall that if $N_m$ is the number of times $m$ appears in the network then the $k$-th frequency moment of $x$
is $F_k := \sum_{m=1}^{M} (N_m)^k$. A randomized algorithm to estimate $F_2$ is given in [1]. This works in the situation
where there is a single processor. To design our distributed algorithm, we use the random maps from their
algorithm. To make the description self-contained, we recall their algorithm:



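Algorithm 1 Streaming algorithm to compute $F_2$

Input: $x \in \{1, 2, \ldots, M\}^N$, 4-wise independent maps $\phi_1, \ldots, \phi_{r_1} : A \to \{+1, -1\}$
1: $y_u^i \leftarrow \phi_i(x_u)$, $1 \le i \le r_1$ and $1 \le u \le N$
2: $s^i \leftarrow \sum_{u=1}^{N} y_u^i$, $1 \le i \le r_1$
3: $\bar{F}_2 \leftarrow r_1^{-1} \sum_{i=1}^{r_1} (s^i)^2$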



The following theorem characterizes the performance of Algorithm 1.

Theorem 3. ([1, Theorem 2.2]) $E(\bar{F}_2) = F_2$ and $\mathrm{Var}(\bar{F}_2) \le 2F_2^2$, and hence

$$P\left[(1 - \epsilon)F_2 \le \bar{F}_2 \le (1 + \epsilon)F_2\right] \ge 1 - \frac{2}{r_1 \epsilon^2}.$$



Observe that $s^i = (N_+^i - N_-^i) = (N_+^i - (N - N_+^i)) = (2N_+^i - N)$, where $N_+^i$ is the number of
elements mapped to $+1$ under the map $\phi_i$ and $N_-^i$ is the number of elements mapped to $-1$ under the map
$\phi_i$. Hence to compute $\bar{F}_2$ the algorithm requires only the number of elements mapped to $+1$.
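The following Python sketch illustrates Algorithm 1 and the observation above on data held by a single processor. It is not the streaming implementation of [1]: for simplicity it assumes fully independent sign maps instead of 4-wise independent ones, and the function name, data and seed are ours.

```python
import random
from collections import Counter

def ams_f2_estimate(data, alphabet_size, r1, rng):
    """Average of (s^i)^2 over r1 random sign maps, as in Algorithm 1.

    Fully independent sign maps are an assumption made here for simplicity;
    [1] only needs 4-wise independence.
    """
    n = len(data)
    total = 0
    for _ in range(r1):
        # phi[m] is the sign assigned to symbol m (index 0 is unused).
        phi = [rng.choice((1, -1)) for _ in range(alphabet_size + 1)]
        n_plus = sum(1 for x in data if phi[x] == 1)
        s = sum(phi[x] for x in data)
        assert s == 2 * n_plus - n      # the observation s^i = 2 N_+^i - N
        total += s * s
    return total / r1

data = [1] * 5 + [2] * 3 + [3] * 2 + [4] * 6
exact = sum(c * c for c in Counter(data).values())   # 25 + 9 + 4 + 36 = 74
estimate = ams_f2_estimate(data, alphabet_size=4, r1=4000, rng=random.Random(0))
print(exact, estimate)
```

The internal assertion checks, for every map, that the sum of signs is determined by the count $N_+^i$ alone, which is what the distributed algorithm exploits.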



2.3 Exponential random variables 

Let $X \sim \mathrm{exprand}(a)$ be an exponentially distributed random variable with mean $1/a$. The probability
density function corresponding to $X$, denoted as $g_X(x)$, is defined as:

$$g_X(x) = \begin{cases} a e^{-ax} & \text{if } x \ge 0 \\ 0 & \text{if } x < 0 \end{cases}$$

Fact 1. ([13, Property 1]) Let $X_i \sim \mathrm{exprand}(a_i)$ be independent exponential random variables with mean
$1/a_i$ for each $i \in \{1, \ldots, N\}$. Let $X = \min_{i=1}^{N} X_i$. Then $X$ is also an exponential random variable with
mean $\left(\sum_{i=1}^{N} a_i\right)^{-1}$.
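Fact 1 can be checked empirically with a few lines of Python. The rates, sample size and seed below are arbitrary choices of ours; `random.expovariate` takes the rate $a$ (the reciprocal of the mean) as its argument.

```python
import random

def min_of_exponentials(rates, rng):
    """Draw X_i ~ exprand(a_i) independently and return min_i X_i."""
    return min(rng.expovariate(a) for a in rates)

rates = [1.0, 2.0, 3.0]
rng = random.Random(0)
samples = [min_of_exponentials(rates, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
print(empirical_mean, 1 / sum(rates))   # the second value is 1/6
```

When every rate equals 1, the minimum over $n$ nodes has mean $1/n$, which is how the algorithms below turn a minimum into a count.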



3 Algorithm for Second Frequency Moment 
3.1 Algorithm 

Our algorithm has three parts. The first part consists of computations performed per node depending on
its own data. In this part, first every node $u$ maps its data $x_u$ to $r_1$ random numbers $\{y_u^1, \ldots, y_u^{r_1}\}$ using
independent random maps and then each of the $y_u^i$'s is mapped to $r_2$ independent random variables. Thus
each node $u$ maps $x_u$ to $r_1 r_2$ random numbers as shown in Figure 1. The second part involves exchange of
information across the network to compute a function $\{z_u^{1,1}, \ldots, z_u^{r_1, r_2}\}$ of the random numbers generated
in the first step. In the last stage the $z_u$'s are first used to compute intermediate estimators $\{\hat{N}_+^1, \ldots, \hat{N}_+^{r_1}\}$
and finally an estimate of $F_2$ is calculated as shown in Figure 2. The exact procedure is explained in
Algorithm 2.
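The three parts can be sketched end to end in Python under a strong simplifying assumption: information spreading is perfect, so every node ends up holding the coordinate-wise minimum over all nodes. The sketch also assumes rate-1 exponentials for nodes mapped to $+1$ and fully independent sign maps; the function name, parameters and data are ours.

```python
import random

def simulate_f2_estimate(data, alphabet_size, r1, r2, rng):
    """Centralized simulation of the three parts, assuming perfect spreading."""
    n = len(data)
    total = 0.0
    for _ in range(r1):
        # Part 1: shared random sign map, then r2 exponentials per node mapped to +1.
        phi = [rng.choice((1, -1)) for _ in range(alphabet_size + 1)]
        # Part 2: after spreading, every node holds the coordinate-wise minimum.
        z_min = [float("inf")] * r2
        for x in data:
            if phi[x] == 1:
                for j in range(r2):
                    z_min[j] = min(z_min[j], rng.expovariate(1.0))
        # Part 3: r2 / sum of minima estimates N_+ (it is 0.0 if no node maps to +1).
        n_plus_hat = r2 / sum(z_min)
        total += (2 * n_plus_hat - n) ** 2
    return total / r1

data = [1] * 40 + [2] * 30 + [3] * 20 + [4] * 10
n = len(data)
exact = 40 ** 2 + 30 ** 2 + 20 ** 2 + 10 ** 2     # F_2 = 3000
estimate = simulate_f2_estimate(data, 4, r1=200, r2=100, rng=random.Random(0))
print(exact / n ** 2, estimate / n ** 2)
```

The printed pair compares the scaled moment $F_2/N^2$ with its estimate; larger $r_1$ and $r_2$ tighten the estimate at the cost of more transmitted numbers.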

In [1], the random maps $\phi_i$ applied to the elements of $x$ are 4-wise independent. However, in
our setting we can use independent random maps because we are not trying to optimize the number of
bits stored per node. Rather, we are trying to optimize the number of bits transmitted per processor. The
random maps can be thought of as global randomness shared by all the nodes.

Figure 1: Mapping to $r_1 r_2$ random variables in a node with data $x_u$

Figure 2: Estimating $N_+^i$ and $F_2$


3.2 Error Analysis 



Let us examine the properties of $\hat{F}_2$ obtained in Algorithm 2. If $N_+^i$ is exactly known then from Theorem 3
the estimate of $F_2$, defined as $\bar{F}_2$, can be written as:

$$\bar{F}_2 := r_1^{-1} \sum_{i=1}^{r_1} (2N_+^i - N)^2.$$

From Theorem 3, we know that

$$P\left[(1 - \epsilon_1)F_2 \le \bar{F}_2 \le (1 + \epsilon_1)F_2\right] \ge 1 - \frac{2}{r_1 \epsilon_1^2} =: p_1. \qquad (1)$$

However, we do not know, rather cannot know, $N_+^i$ exactly for any $i$. In our algorithm, $N_+^i$ is a random
variable that depends on the random map $\phi_i$. Steps 3, 4 serve the purpose of estimating $N_+^i$ for the maps
in Step 1, under the assumption that Step 3 has taken place without any error. However, recall that in
point-to-point as well as random planar networks, Step 3 itself uses randomness.

Error in Step 3 for point-to-point networks: We say that an error has occurred in Step 3 if $\exists u \in
\{1, 2, \ldots, N\}$ such that $|S_u(T_1)| \ne N$. Therefore, the probability of error is bounded by $\beta_1$.

Error in Step 3 for random planar networks: We say that an error has occurred in Step 3 if
$|S_u(T_2)| \ne N$ for some node $u$. Therefore, the probability of error is bounded by $\beta_2$.

Assuming no error takes place in Step 3, we now analyze the error in $\hat{F}_2$. To finally bound the overall
error, we simply combine errors coming from different steps in the algorithm.

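Algorithm 2 Algorithm run by node $u$

Input: $x_u \in A$, independent maps $\phi_1, \ldots, \phi_{r_1} : A \to \{+1, -1\}$
1: $y_u^i \leftarrow \phi_i(x_u)$, $1 \le i \le r_1$
2: For each $j \in [1, r_2]$, $z_u^{i,j}(0)$ is chosen randomly and independently according to
   $z_u^{i,j}(0) = \begin{cases} \mathrm{exprand}(y_u^i) & \text{if } y_u^i = 1 \\ \infty & \text{if } y_u^i = -1 \end{cases}$
3: Depending on the information spreading algorithm, node $u$ receives information from node $v$ at time
   $1 \le t \le T$. On receipt of this information it updates as follows:
   $z_u^{i,j}(t) \leftarrow \min\{z_u^{i,j}(t-1), z_v^{i,j}(t-1)\}$
4: Let $\hat{N}_+^i \leftarrow r_2 \left(\sum_{j=1}^{r_2} z_u^{i,j}(T)\right)^{-1}$.
5: $\hat{F}_2 \leftarrow r_1^{-1} \sum_{i=1}^{r_1} (2\hat{N}_+^i - N)^2$.

Recall that $z_u^{i,j}(T)$ is an exponential random variable. Assuming $z_u^{i,j}(T)$ is correct at time $T$, let us
define $Z^i := \sum_{j=1}^{r_2} z_u^{i,j}(T)$. Conditioned on $N_+^i$, we can use Chernoff bound analysis to show that for any
constant $\epsilon \in (0, 1/2)$,

$$P\left[\left|\frac{Z^i}{r_2} - \frac{1}{N_+^i}\right| > \frac{\epsilon}{N_+^i}\right] \le 2\exp\left(-\frac{\epsilon^2 r_2}{3}\right).$$

This can be written as

$$P\left[(1 - \epsilon_2)N_+^i \le \frac{r_2}{Z^i} \le (1 + \epsilon_2)N_+^i\right] \ge p_2, \qquad (2)$$

where $\epsilon_2 := 2\epsilon$ and $p_2 := 1 - 2\exp\left(-\frac{\epsilon^2 r_2}{3}\right)$.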

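Writing $\hat{N}_+^i := r_2/Z^i$, i.e., $\hat{N}_+^i$ is the estimate of $N_+^i$, and expanding $\hat{F}_2$, we have

$$\hat{F}_2 = r_1^{-1} \sum_{i=1}^{r_1} (2\hat{N}_+^i - N)^2 = r_1^{-1} \sum_{i=1}^{r_1} \left(4(\hat{N}_+^i)^2 - 4N\hat{N}_+^i + N^2\right).$$

If $(1 - \epsilon_2)N_+^i \le \hat{N}_+^i \le (1 + \epsilon_2)N_+^i$, then $\hat{F}_2$ can be upper bounded as:

$$\begin{aligned}
\hat{F}_2 &\le r_1^{-1} \sum_{i=1}^{r_1} \left(4(1 + \epsilon_2)^2 (N_+^i)^2 - 4N(1 - \epsilon_2)N_+^i + N^2\right) \\
&= r_1^{-1} \sum_{i=1}^{r_1} \left(4(N_+^i)^2 - 4N N_+^i + N^2\right) + r_1^{-1} \sum_{i=1}^{r_1} 4\epsilon_2 N_+^i \left(\epsilon_2 N_+^i + 2N_+^i + N\right) \\
&\le \bar{F}_2 + 4\epsilon_2 N(\epsilon_2 N + 2N + N) \\
&\le (1 + \epsilon_1)F_2 + 4\epsilon_2 N(\epsilon_2 N + 2N + N) \\
&\le F_2 + N^2\left(\epsilon_1 + 4\epsilon_2(3 + \epsilon_2)\right) = F_2 + N^2 \epsilon,
\end{aligned}$$

where $\epsilon = \epsilon_1 + 4\epsilon_2(3 + \epsilon_2)$. Similarly we can lower bound $\hat{F}_2$ as:

$$\hat{F}_2 \ge F_2 - \epsilon N^2.$$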
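The algebra in the upper bound can be sanity-checked numerically. The Python sketch below, with names of our choosing, searches over all values of $N_+$ and over the extreme admissible estimates $(1 \pm \epsilon_2)N_+$, and compares the largest per-map gap with the term $4\epsilon_2(3 + \epsilon_2)N^2$ used above.

```python
def combined_error(eps1, eps2):
    """epsilon = eps1 + 4 * eps2 * (3 + eps2) from the error analysis."""
    return eps1 + 4 * eps2 * (3 + eps2)

def worst_case_gap(n, eps2):
    """Largest (2*Nhat - N)^2 - (2*Nplus - N)^2 over the allowed range of Nhat.

    Nplus ranges over 0..n and Nhat over the two endpoints (1 -/+ eps2) * Nplus;
    the expression is convex in Nhat, so checking the endpoints is enough.
    """
    worst = 0.0
    for n_plus in range(n + 1):
        exact = (2 * n_plus - n) ** 2
        for n_hat in ((1 - eps2) * n_plus, (1 + eps2) * n_plus):
            worst = max(worst, (2 * n_hat - n) ** 2 - exact)
    return worst

n, eps2 = 200, 0.05
print(worst_case_gap(n, eps2), 4 * eps2 * (3 + eps2) * n ** 2)
print(combined_error(0.1, eps2))
```

The first printed value never exceeds the second, which reflects that the bound in the text is valid but not tight.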

Combining the three probabilities $p_1$, $(1 - \beta)$ and $p_2$, corresponding to the random maps, the information
spreading algorithm and the exponential random variables respectively, we get

$$P\left[\frac{F_2}{N^2} - \epsilon \le \frac{\hat{F}_2}{N^2} \le \frac{F_2}{N^2} + \epsilon\right] \ge p_1 p_2 (1 - \beta).$$

Note that $\epsilon$ depends on $\epsilon_1$ and $\epsilon_2$, which can be chosen arbitrarily small. Also $p_1$ and $p_2$ depend on $(\epsilon_1, r_1)$
and $(\epsilon_2, r_2)$ respectively. Thus $(1 - p_1)$ and $(1 - p_2)$ can also be made arbitrarily small by suitably choosing
$r_1$ and $r_2$. $\beta$ can also be made arbitrarily small by suitably choosing $T$.

The space analysis for each node is fairly standard. We include it here for the sake of completeness.
Each node transmits a vector of length $r_1 r_2$. Each entry of this vector is an exponential random variable.
Let us assume that $s$ bits suffice to store the exponential random variables (see footnote 1). When node $u$
receives the vector of $v$, it computes the coordinate-wise minimum of the two $r_1 r_2$-element vectors. Only
this minimum is stored at every node and transmitted each time the node is activated. This suffices because
the "min" function is unaffected by the sequence in which the different nodes are heard and also by whether
a node is heard multiple times. If $s$ bits suffice for storing exponential random variables, then each node
transmits $O(r_1 r_2 s)$ bits. $s$ is determined below by suitably truncating and quantizing the $z_u^{i,j}(t)$, which are
exponential random variables.

We will show below that $O(\log N)$ bits of precision suffice to store $z_u^{i,j}(t)$. This will allow $F_2$ to be
estimated within the same factor of approximation with an additional small error. We thus modify only
Step 2 in Algorithm 2 as follows:

Algorithm 3 Modified step 2 of Algorithm 2 with quantized random variables
For $1 \le j \le r_2$, generate $z_u^{i,j}(0)$ by the following rule until $z_u^{i,j}(0) < L$:
   $z_u^{i,j}(0) = \mathrm{exprand}(y_u^i)$ if $y_u^i = 1$
Uniformly quantize $z_u^{i,j}(0)$ using $B$ bits.
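A minimal Python sketch of the truncation and quantization step follows. The choice of representing each cell by its midpoint, the concrete values of the limit and the bit budget, and the function name are our assumptions; the text only fixes that $L$ and $B$ are $\Theta(\log N)$.

```python
import random

def truncated_quantized_exprand(rate, limit, bits, rng):
    """Redraw exprand(rate) until it is below `limit`, then quantize uniformly.

    The interval [0, limit) is split into 2**bits cells and the value is
    replaced by the midpoint of its cell (an assumption of this sketch), so
    only the cell index, a `bits`-bit integer, would need to be transmitted.
    """
    z = rng.expovariate(rate)
    while z >= limit:
        z = rng.expovariate(rate)
    cells = 2 ** bits
    index = min(int(z / limit * cells), cells - 1)
    return index, (index + 0.5) * limit / cells

rng = random.Random(0)
index, value = truncated_quantized_exprand(rate=1.0, limit=10.0, bits=10, rng=rng)
print(index, value)
```

Because the coordinate-wise minimum of quantized values equals the quantized minimum, the update in Step 3 can operate directly on the transmitted cell indices.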



The other steps of Algorithm 2 remain unchanged. If the maximum relative error in estimating $N_+^i$
due to truncation is $\mu$ and $L$ and $B$ are both chosen as $\Theta(\log N)$, then for the estimate of $F_2$ we have,
following the analysis of [2],

$$P\left[\frac{F_2}{N^2} - \epsilon \le \frac{\hat{F}_2}{N^2} \le \frac{F_2}{N^2} + \epsilon\right] \ge 1 - \delta.$$

Here, $\epsilon = \epsilon_1 + 8\mu(3 + 2\mu)$, and $\delta$ accounts for the failure probabilities of the random maps, the information
spreading algorithm and the exponential random variables. This means that in gossip networks at
each step a node will transmit $r_1 r_2 \Theta(\log N)$ bits. Further, this also tells us that each slot of the slotted
Aloha protocol should be $r_1 r_2 \Theta(\log N)$ bit periods.
We have thus proved Theorem 1.



Percolating RPRN 

Let us now consider the computation of the estimate of $F_2$ in percolating RPRN, where a fixed fraction
of the data is missing, i.e., $N_\alpha := (1 - \alpha)N$ of the nodes have participated in the computation of $F_2$. Let
$F_{2,\alpha}$ be the second frequency moment calculated from $N_\alpha$ nodes. Let the counts $m_i$ be arranged in
descending order as $m_{i_1} \ge m_{i_2} \ge \ldots \ge m_{i_M}$. It is easy to see that the difference between $F_{2,\alpha}$ and $F_2$ will be
maximized when the nodes that are removed had value $i_1$. Therefore,

$$F_{2,\alpha} \le (m_{i_1} - \alpha N)^2 + \sum_{j=2}^{M} m_{i_j}^2 \le \sum_{j=1}^{M} m_{i_j}^2 - 2\alpha N m_{i_1} + \alpha^2 N^2.$$

If $m_{i_1} \ge \alpha N$,

$$F_{2,\alpha} \le F_2 - 2\alpha^2 N^2 + \alpha^2 N^2 \le F_2 - \alpha^2 N^2. \qquad (3)$$

If $m_{i_1} < \alpha N$, then similar calculations can be performed to get the same bound. Let $\hat{F}_{2,\alpha}$ be the output of
Algorithm 2 applied on $N_\alpha$ nodes. Then by Theorem 1 we know,

$$P\left[|\hat{F}_{2,\alpha} - F_{2,\alpha}| > (1 - \alpha)^2 N^2 \epsilon\right] < \delta. \qquad (4)$$

Using equations 3 and 4, we get,

$$P\left[|\hat{F}_{2,\alpha} - F_2| > N^2 \alpha^2 (1 - \epsilon)\right] < \delta. \qquad (5)$$



Equation 5 is applicable to the networks which operate in the percolation regime where a constant fraction of
the nodes do not take part in function computation. Therefore, we have the following corollary.

Footnote 1: Then $\infty$ in the algorithm can be represented using $s + 1$ bits.

Corollary 1. For all constants $\epsilon, \delta \in (0, 1)$, there exist $r_1, r_2 = \mathrm{poly}(\epsilon^{-2}, \log \delta^{-1})$ so that there is a
randomized algorithm that runs in time $O(T)$, uses $O(r_1 r_2 \log N)$ bits of transmission per step and computes
an estimate of $F_2$, say $\hat{F}_{2,\alpha}$, such that $P\left[|\hat{F}_{2,\alpha} - F_2| > N^2 \alpha^2 (1 - \epsilon)\right] < \delta$. Here $T$ is the time needed by the
information spreading algorithm and $\alpha$ is the fraction of the nodes which are not in the giant component.



3.2.1 Second frequency moment using bottom-$r_2$ sketch

In Algorithm 2, for each node $u$ and $1 \le i \le r_1$, $y_u^i$ is mapped to $r_2$ independent random variables. Let
$V_u^i := (z_u^{i,1}(T), \ldots, z_u^{i,r_2}(T))$, $1 \le i \le r_1$, denote the vector computed by the node $u$ after time $T$. Note
that each $V_u^i$ is an $r_2$ sized vector each element of which is the minimum of $N$ independent exponential
random variables. $V_u^i$ is also known as an $r_2$-mins sketch in the literature [5]. Recall from Section 3.2 that
each $z_u^{i,j}(T)$ is an exponential random variable with mean $1/N_+^i$ and thus is used to estimate $N_+^i$. Recall,
$r_2 \in O(1)$, therefore it is asymptotically very small. However, in practice, reducing it to a small constant
will help in reducing the amount of randomness used by the algorithm, and may help in bringing down the
number of bits transmitted per node.

We observe that $r_2$ can be reduced to 1. This certainly helps in reducing the number of random maps
used per node. However, because of the manner in which the final estimate is computed, we do not see
any way of saving the number of bits transmitted per node. We use the bottom-$r_2$ sketch as defined in [5]. For
each $1 \le i \le r_1$, we map $y_u^i$ to a single exponential random variable $z_u^i$. For a fixed $i$, arrange the $z_u^i$'s in
non-decreasing order, say $z_{(1)}^i \le z_{(2)}^i \le \ldots \le z_{(N)}^i$. It is shown in [5] that using the $r_2$ smallest values, a good
estimate can be computed. Therefore, in our case it will suffice if each node knows these $r_2$ minimum
values. Each node can do this by keeping track of the $r_2$ smallest values seen so far for each $i$. This can be
done by the following bookkeeping: Node $u$ holds a vector $V_u^i := (z_u^i, \infty, \ldots, \infty)$, for each $1 \le i \le r_1$.
Node $u$ communicates the vector $V_u^i$ to node $v$ and updates its vector by appropriately inserting the values
from $V_v^i$ to get the first $r_2$ minimum values available in the network for each $i$. At the end, for each $u$
we have $V_u^i(1) \le V_u^i(2) \le \ldots \le V_u^i(r_2)$ representing the $r_2$ lowest values of the network. To summarize,
each node can estimate $N_+^i$ by generating only one exponential random variable instead of $r_2$ independent
random variables. However, to compute the bottom-$r_2$ sketch for estimating $F_2$, each node transfers $r_1 r_2$
numbers, i.e., $O(r_1 r_2 \log N)$ bits of transmission per processor.
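The bookkeeping for one index $i$ can be sketched in Python as follows. The function name is ours, and the sketch assumes that values generated by different nodes are distinct (which holds with probability 1 for unquantized exponentials), so that equal values can be treated as the same node heard twice.

```python
import math

def merge_bottom(own, received, r2):
    """Keep the r2 smallest distinct values seen in either vector.

    Duplicates are dropped so that hearing the same node twice (directly or
    through a relay) does not insert its value more than once.
    """
    finite = {z for z in own + received if not math.isinf(z)}
    smallest = sorted(finite)[:r2]
    return smallest + [math.inf] * (r2 - len(smallest))

r2 = 3
v_u = [0.42, math.inf, math.inf]      # node u starts with its own value only
v_v = [0.17, 0.42, 0.90]              # node v has already heard u and others
print(merge_bottom(v_u, v_v, r2))     # [0.17, 0.42, 0.9]
```

Like the coordinate-wise minimum, this merge is insensitive to the order in which vectors arrive, so it fits the same information spreading algorithms.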



4 Algorithm for Higher Frequency Moments 

In this section we present an algorithm to compute frequency moments $F_k$, for all $k \ge 3$. In the data
streaming literature, many algorithms are known for computing $F_k$. (See for example [1, 6, 8].) In [1],
sampling is used for estimating $F_k$ for $k \ge 3$. For the special case of $k = 2$, they give a sketching
algorithm as shown in Algorithm 1. On the other hand, [6, 8] use sketching algorithms for estimating $F_k$.
The map $\phi$ in Algorithm 1 can be thought of as a map from the input alphabet to the square roots of unity.
A possible generalization of this for $k \ge 3$ is a map from the input alphabet to $k$-th roots of unity. In [8] it
was proved that maps from the input alphabet to $k$-th roots of unity can be used for estimating $F_k$. In order
to estimate $F_k$ in our setting, we use a combination of random maps to $k$-th roots of unity and exponential
random variables. Our primary observation is that Fact 1 helps us compose exponential random variables
with the maps to $k$-th roots. This composed map in turn helps in estimating $F_k$ for $k \ge 3$ in all the three
models of distributed networks. In order to explain the central idea used in estimating $F_k$, $k \ge 3$, we give
a simplified version of our original algorithm as Algorithm 4.

Note the similarity between Algorithm 4 and Algorithm 2. Algorithm 4 is overly simplified.
It was observed by [8] that the sum of the $y_u^p$'s when raised to the power $k$ has expectation equal to $F_k$; however, its
variance is very large. This problem was resolved by using a bucketing strategy. For each node $u$, $x_u$ is
mapped to one of $\{1, 2, \ldots, B\}$ buckets using $s_1$ different maps $\chi_1, \chi_2, \ldots, \chi_{s_1} : A \to \{1, 2, \ldots, B\}$. (In
our setting we can use independent random maps because we are not trying to optimize the number of bits
stored per node. We only try to optimize the number of bits transmitted per node. The random maps can
be thought of as global randomness shared by all the nodes.) It was proved that $B \cdot s_1 = O(M^{1 - \frac{1}{k-1}})$ [8].
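The expectation property attributed to [8] can be verified exactly on a toy instance by enumerating every map from the alphabet to the $k$-th roots of unity. The Python sketch below does this for uniformly random, independent maps; the function name and the counts are ours, and the enumeration is only feasible for very small $M$ and $k$.

```python
import cmath
from itertools import product

def expected_kth_power(counts, k):
    """Average of (sum_m N_m * phi(m))**k over all maps phi: A -> k-th roots of unity."""
    roots = [cmath.exp(2j * cmath.pi * l / k) for l in range(k)]
    total = 0
    maps = 0
    for phi in product(roots, repeat=len(counts)):
        s = sum(n_m * root for n_m, root in zip(counts, phi))
        total += s ** k
        maps += 1
    return total / maps

counts = [2, 1, 3]                       # N_1, N_2, N_3
f3 = sum(c ** 3 for c in counts)         # 8 + 1 + 27 = 36
print(f3, expected_kth_power(counts, 3))
```

Up to floating-point error the average equals $F_3$, because every cross term contains a power $\phi(m)^c$ with $0 < c < k$ whose mean over the roots of unity is zero. The individual values of the $k$-th power vary widely, which is the large variance the bucketing strategy addresses.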

The error analysis of the algorithm can be done in the same way as done for $F_2$ in Section 3.2. It can
be shown that

$$P\left[\frac{F_k}{N^k} - \epsilon \le \frac{\hat{F}_k}{N^k} \le \frac{F_k}{N^k} + \epsilon\right] \ge p,$$

where $\epsilon$ is a function of $k$, $M$ and errors due to $\phi$, $\chi$ and exponential random variables.

Algorithm 4 Algorithm for higher frequency moments run by node $u$

Input: $x_u \in A$, $\phi_1, \ldots, \phi_{r_1} : A \to \{\alpha_1 + i\beta_1, \ldots, \alpha_k + i\beta_k\}$, where $\alpha_l + i\beta_l = e^{2\pi i l/k}$.
1: $y_u^p \leftarrow \phi_p(x_u)$, $1 \le p \le r_1$
2: If $y_u^p = \alpha + i\beta$ then for $1 \le q \le r_2$
   $z_{\alpha,u}^{p,q}(0) \leftarrow \mathrm{exprand}(\alpha + 1)$
   $z_{\beta,u}^{p,q}(0) \leftarrow \mathrm{exprand}(\beta + 1)$
3: Depending on the information spreading algorithm, node $u$ receives information from node $v$ at time
   step $1 \le t \le T$. On receipt of this information it updates as follows:

Similar to the result for $F_2$ in Section 3.2, $p$ is a function of $r_1$, $r_2$ and errors due to $\phi$, $\chi$ and exponential
random variables. As seen earlier, $\epsilon$ can be made arbitrarily small by controlling the errors due to the random
maps and similarly $(1 - p)$ can be made arbitrarily small by choosing $r_1$, $r_2$ appropriately. Proceeding
on the lines of the proof of Theorem 1 we get Theorem 2 from here.

5 Discussion 

• In this paper we have considered one-shot computation of $F_k$. For random planar networks, it is also
of interest to develop algorithms to compute $F_k$ for a sequence of data vectors $x$. In this case
the computation of $F_k$ for the different elements of the sequence will be pipelined. The techniques
of [12, 18] easily extend to this case.

• Sketching is a commonly used technique in dealing with massive data sets. It involves mapping the
given data from a large alphabet into a relatively smaller alphabet preserving the relevant properties.
Let $f : \{1, 2, \ldots, M\}^n \to \{1, 2, \ldots, M\}$ be a function and let $\phi : \{1, 2, \ldots, M\} \to \{1, 2, \ldots, k\}$
denote a map used by a sketching algorithm to compute $f$. (Note, $M \gg k$.) For $1 \le i \le k$,
let $N_i^{\phi}$ denote the number of elements mapped to $i$ under the mapping $\phi$. Suppose for every input
$x = (x_1, x_2, \ldots, x_n) \in \{1, 2, \ldots, M\}^n$, $f(x)$ can be estimated using $N_1^{\phi}(x), N_2^{\phi}(x), \ldots, N_k^{\phi}(x)$;
then we call $f$ sketch type sensitive. That is, if $f$ is sketch type sensitive, $f$ essentially depends
on a type-vector, i.e., a vector of length $k$, with each entry $1 \le i \le k$ in the vector corresponding
to the number of elements of the original alphabet mapped to $i$. $F_2$ is one such function. As noted
in Section 3, the estimate of $F_2$ is $(2N_+ - N)^2$. In fact $\forall k \ge 2$, $F_k$ is sketch type sensitive. We believe that our
techniques can be used for estimating any sketch type sensitive function.

• We compute the estimate of the scaled version of $F_k$, i.e., $F_k/N^k$. It will be interesting to estimate
$F_k$ itself. Also, in our algorithm, we assume that all nodes know the sketching functions $\phi_i$. An
algorithm that does not assume such shared randomness will be an improvement over our algorithm.

• This work shows that the techniques developed for space efficient algorithms to compute functions
of streaming data can be used to reduce the communication in distributed computing of functions of
distributed data. This connection needs to be explored further.